library(ggdag)

dag_demo <- dagify(
  W ~ N + p,
  L ~ N + p,
  coords = time_ordered_coords()
)

ggdag(dag_demo) + theme_dag()
The globe-tossing example from chapter two provides a simple illustration of causal influence.

Here, N represents the number of times the globe is tossed, W the number of tosses caught with your finger on water, and L the number caught on land. p represents the proportion of the earth’s surface covered by water.
The DAG says that “the proportion of the earth covered by water and the number of tosses influence both the number of water catches and the number of land catches,” an obvious and profound result.
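As a gut check, here is a minimal generative sketch of that DAG. The value 0.71 for p is an assumption for illustration (roughly the true water coverage); everything else follows the arrows: N and p jointly cause W and L.

```python
import numpy as np

rng = np.random.default_rng(42)

# N and p jointly determine both W and L, exactly as the DAG asserts
N = 9     # number of tosses
p = 0.71  # assumed proportion of the globe covered by water

W = rng.binomial(N, p)  # water catches
L = N - W               # land catches
print(W, L)             # W + L == N by construction
```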
For each possible explanation of the sample,
Count all the ways the sample could happen.
Explanations with more ways to produce the sample are more plausible.
Statistical Rethinking 2023 - 02 - The Garden of Forking Data (9:22)
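To make the counting recipe concrete, here is a sketch for a toy four-sided globe (a simplifying assumption) and the sample W, L, L: for each candidate composition, multiply the number of ways to produce each observation.

```python
# Candidate explanations: how many of the 4 sides are water
sample = ["W", "L", "L"]

ways = {}
for water_sides in range(5):
    land_sides = 4 - water_sides
    # Multiply the ways to produce each observation across the sample
    count = 1
    for obs in sample:
        count *= water_sides if obs == "W" else land_sides
    ways[water_sides] = count

print(ways)  # {0: 0, 1: 9, 2: 8, 3: 3, 4: 0}
```

One water side (p = 1/4) has the most ways to produce this sample, so it is the most plausible explanation.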
Bayesian models learn, updating their estimates as each new observation arrives:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st

x = np.linspace(0, 1)

fig, axes = plt.subplots(3, 3, figsize=(10, 10))
axes = axes.flatten()

# 1 = Water, 0 = Land
tosses = [1, 0, 0, 1, 0, 1, 1, 1, 1]

# Find the maximum y-value across all posteriors so panels share a scale
max_y = 0
for i in range(1, len(tosses) + 1):
    W = sum(tosses[0:i])
    L = i - W
    y_vals = st.beta.pdf(x, W + 1, L + 1)
    max_y = max(max_y, np.max(y_vals))

# Add a small margin to max_y for plotting
max_y *= 1.1

# Each panel shows the posterior after one more toss
label = ""
for i in range(1, len(tosses) + 1):
    label += "L" if tosses[i - 1] == 0 else "W"
    W = sum(tosses[0:i])
    L = i - W
    axes[i - 1].axvline(x=0.71, color="red", linestyle="dashed")
    axes[i - 1].plot(x, st.beta.pdf(x, W + 1, L + 1), "b-")
    axes[i - 1].set_title(label)
    axes[i - 1].set_xlabel("p")
    axes[i - 1].set_ylabel("Density")
    axes[i - 1].set_ylim(0, max_y)

plt.tight_layout()
plt.show()
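Under a flat Beta(1, 1) prior, the posterior after W water and L land catches is Beta(W + 1, L + 1), which is what the panels above plot. A quick sketch confirming that updating one toss at a time lands on the same posterior as using the whole sample at once:

```python
tosses = [1, 0, 0, 1, 0, 1, 1, 1, 1]

# Batch update: flat Beta(1, 1) prior plus the whole sample
W = sum(tosses)
L = len(tosses) - W
batch = (W + 1, L + 1)

# Sequential update: fold in one toss at a time
a, b = 1, 1
for t in tosses:
    a += t       # a water catch bumps the first shape parameter
    b += 1 - t   # a land catch bumps the second

print(batch, (a, b))  # (7, 4) (7, 4)
```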
Instead of watching the model “learn” the posterior, you can sample from the posterior directly:
from IPython.display import HTML
import matplotlib.animation as animation

fig, ax = plt.subplots(1, 3, figsize=(10.5, 3.5))
post_preds = []

def hmcmc(frame, W=6, L=3):
    # Clear axes before replotting
    for ax_ in ax:
        ax_.clear()
    # Draw one sample from the posterior
    sample = st.beta.rvs(W + 1, L + 1)
    # Plot 1: the analytical Beta PDF plus the current sample
    ax[0].plot(x, st.beta.pdf(x, W + 1, L + 1), "b-")
    ax[0].axvline(x=sample, color="red", linestyle="dashed", alpha=0.6)
    ax[0].set_title(f"Analytical PDF (Sample: {sample:.2f})")
    # Plot 2: predictive distribution (Binomial) for this sampled p
    pred_samples = st.binom.rvs(n=W + L, p=sample, size=1000)
    ax[1].hist(
        pred_samples,
        bins=np.arange(W + L + 2) - 0.5,
        color="skyblue",
        edgecolor="black",
    )
    ax[1].set_title("Predictive Distribution")
    ax[1].set_ylim(0, 350)
    # Plot 3: posterior predictive, accumulated across frames
    post_preds.append(np.median(pred_samples))
    ax[2].hist(
        post_preds,
        bins=np.arange(W + L + 2) - 0.5,
        color="orange",
        edgecolor="black",
    )
    ax[2].set_title(f"Posterior Predictive (N={len(post_preds)})")
    ax[2].set_ylim(0, 45)

ani = animation.FuncAnimation(fig=fig, func=hmcmc, frames=100, interval=50)
plt.close()
HTML(ani.to_jshtml())
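Animation aside, the posterior predictive distribution that the third panel accumulates can be built directly: draw p from the posterior, then simulate a fresh set of tosses for each draw. A minimal sketch with the same W = 6, L = 3:

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
W, L, n_draws = 6, 3, 10_000

# One posterior draw of p per row, then one simulated water count per draw
p_draws = st.beta.rvs(W + 1, L + 1, size=n_draws, random_state=rng)
ppd = st.binom.rvs(n=W + L, p=p_draws, random_state=rng)

# Relative frequency of each possible water count, 0 through 9
print(np.bincount(ppd, minlength=W + L + 1) / n_draws)
```

Because each simulated count uses a different posterior draw of p, this distribution is wider than any single Binomial: it carries the parameter uncertainty along.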
We can model M, our measurement error. In lecture 2, this was modeled as the acceptance criterion for the sampler.
W = 6
L = 3
N = 1000

p = np.repeat(0.0, N)
p[0] = 0.5
p_unadjusted = p.copy()

q0 = st.binom.pmf(W, W + L, p[0])
for i in range(1, N):
    # Propose a value with a random walk around the current sample
    p_new = st.norm.rvs(p[i - 1], 0.1)
    # Reflect proposals back into [0, 1]
    while p_new < 0:
        p_new = np.abs(p_new)
    while p_new > 1:
        p_new = 2 - p_new
    # Keep the unadjusted (never-rejected) proposal for comparison
    p_unadjusted[i] = p_new
    # Likelihood of the data under the proposed value
    q1 = st.binom.pmf(W, W + L, p_new)
    if st.uniform.rvs(0, 1) < q1 / q0:
        # Accept
        p[i] = p_new
        q0 = q1
    else:
        # Reject: stay at the current value
        p[i] = p[i - 1]
import arviz as az

fig, ax = plt.subplots(1, 1)
ax.plot(x, st.beta.pdf(x, W + 1, L + 1), "r-", label="Analytical posterior")
az.plot_kde(p, label="Metropolis approximation", ax=ax)
az.plot_kde(
    p_unadjusted,
    label="Metropolis approximation (unadjusted)",
    plot_kwargs={"color": "purple"},
    ax=ax,
)
fig.tight_layout()
ax.legend()
plt.show()
This plot compares the density estimate for the unadjusted proposals with the “misclassification”-adjusted posterior. Notice the closer resemblance to the analytical posterior and the steeper peak of the adjusted curve: more precision around the parameter!
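As a final sanity check, here is a compact, self-contained rerun of the same Metropolis scheme (collapsing the two reflection loops into one step, which suffices for this proposal scale). The chain's moments should roughly match the analytical Beta(W + 1, L + 1):

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
W, L, N = 6, 3, 5000

p = np.empty(N)
p[0] = 0.5
q0 = st.binom.pmf(W, W + L, p[0])
for i in range(1, N):
    # Random-walk proposal, reflected back into [0, 1]
    p_new = abs(rng.normal(p[i - 1], 0.1))
    if p_new > 1:
        p_new = 2 - p_new
    q1 = st.binom.pmf(W, W + L, p_new)
    if rng.uniform() < q1 / q0:
        p[i], q0 = p_new, q1
    else:
        p[i] = p[i - 1]

print(p.mean(), st.beta.mean(W + 1, L + 1))  # both near 0.64
print(p.std(), st.beta.std(W + 1, L + 1))    # both near 0.14
```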